Abstract

The knowledge needed to process natural language comes from many
sources. While the knowledge itself may be broken up modularly, into
knowledge of syntax, semantics, etc., the actual processing should be
completely integrated. This form of processing is not easily amenable to
the type of processing done by serial "von Neumann" computers. This work
in progress is an investigation of the use of a highly parallel,
spreading activation and lateral inhibition network as a mechanism for
integrated natural language processing.


A shorter version of this paper is included in the Proceedings of the
1982 Cognitive Science Conference, Ann Arbor, Michigan.

This  work  was  supported  in  part  by  the  Office  of  Naval  Research  under 
contract  N00014-75-C-0612. 
1. Introduction

It has long been thought that the modular decomposability of
knowledge into syntax, semantics and pragmatics implied that
language processing could be similarly decomposed; that natural language
could be processed by first parsing the syntax, then fleshing out the
meaning of a syntactic derivation tree, and finally (if we could ever
get to this point!) attempting to interpret the speaker's intentions.
Nowadays, it has become apparent that this processing is integrated in
humans [Marslen-Wilson 1980], and that it should, thus, also be in
computer models [Schank and Birnbaum 1980; DeJong 1980]. However, the
natural inclination of von Neumann computers to run one step at a time
presents a severe roadblock to the kind of integration needed for NLP.

What is needed is an integration mechanism sensitive to
interpretation pressures from several directions. A promising approach
would seem to be the use of a quantitative spreading activation /
lateral inhibition network. This kind of network, similar in function to
relaxation techniques for low-level vision, and to neural network
models, works through the iterative adjustment of real-valued node
weights.

2. Previous and Related Work

The term "spreading activation" is almost as overworked as the term
"frame," but most systems which spread activation do it in one of two
ways: as marker passing intersection search [Quillian, 1968; Collins
and Quillian 1972; Fahlman 1980], in which a parallel intersection
search is simulated by binary marking of adjacent nodes in a
breadth-first manner, or as quantitative weight balancing [Ortony, 1976;
McClelland and Rumelhart 1980], in which activation energies assigned to
all nodes are iteratively adjusted, based on local activation energies
and strength of connections. One of the well-known dangers of spreading
activation is its potential for overkill; an intersection search, under
certain circumstances, may generate too many useless intersections, and
quantitative adjustment may result in "heat death," where every node
becomes activated. (A solution for this latter form of activation
involves the use of decay, dampening factors, or the spread of negative
energy - lateral inhibition.) Nonetheless, both forms of spreading
activation display interesting behavior.
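
The marker-passing style can be illustrated with a small sketch. It is
not code from any of the systems cited; the toy graph, the function
names mark and intersection_search, and the depth limit are all
assumptions made for illustration. Each source node spreads a binary
marker breadth-first (here sequentially, simulating the parallel
spread), and the intersection is the set of nodes marked from both
sources.

```python
from collections import deque


def mark(graph, start, max_depth):
    """Breadth-first binary marking: nodes within max_depth links of start."""
    marked = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # the depth limit keeps the spread from flooding the graph
        for neighbor in graph.get(node, ()):
            if neighbor not in marked:
                marked.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return marked


def intersection_search(graph, a, b, max_depth=2):
    """Nodes that carry markers from both a and b."""
    return mark(graph, a, max_depth) & mark(graph, b, max_depth)


# Hypothetical toy network: node -> adjacent nodes.
graph = {
    "canary": ["bird"],
    "bird": ["animal"],
    "shark": ["fish"],
    "fish": ["animal"],
}
common = intersection_search(graph, "canary", "shark")
```

Without a depth limit, a richly connected graph would mark nearly
everything, which is the "too many useless intersections" danger noted
above.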

For example, the previously mentioned work by Collins and Quillian
showed how spreading activation could account for aspects of human
memory priming, while Fahlman's work demonstrated that many forms of
problem solving could be simplified and speeded up when intersection
search was computationally inexpensive. Ortony, on the other hand, built
a system for schema selection using damped activation, and McClelland
and Rumelhart effected a close simulation of experimental results on
human letter and word perception in context.

Other work in parallel approaches to natural language processing
has been done by Small [1981] and Rieger [1977], where the traditional
practice of breaking down knowledge into syntax and semantics was turned
on its head, and knowledge of all kinds was distributed to individual
"word experts"; by Hendler and Phillips [1981], who are working on an
ACTOR-based [Hewitt, 1976] NLP system; and by Gigley [1982], who has
built a neurolinguistically-inspired NLP system capable of simulating
aphasic behavior.

3. NLP Using Spreading Activation and Lateral Inhibition

The authors of this paper are presently building an NLP system in
which the knowledge sources are modular, but the processing is fully
integrated. The knowledge is represented in a semantic network where the
nodes represent concepts and the links represent binary relations. The
integration mechanism is an activation/inhibition network similar in
nature to the one used by McClelland and Rumelhart and described below.
Processing takes place as (word) input causes the creation of an
unstable network of possibilities while activation and inhibition sift
and stabilize the network such that the "best" interpretation is
highlighted.

3.1. Activation and Inhibition

An activation/inhibition network is a weighted directed graph,
where node weights, $W_i$, represent activation levels, and link
weights, $L_{ij}$, represent strength of activation (if positive) or of
inhibition (if negative). The processes of spreading activation and
lateral inhibition involve the iterative recomputation of the activation
level for each node based on its weighted connections. At each cycle $t$,
every node receives a contribution from each of its neighboring nodes
equivalent to the neighbor's activation level multiplied by the weight
of the intervening link:

    $$C_i(t) = \sum_j L_{ij} W_j(t)$$

This contribution (scaled to range between -1 and 1) causes a
proportional change in the activation level of the node for the next
iteration:

    $$W_i(t+1) = W_i(t) + \max(C_i(t), 0)\,(M - W_i(t)) + \min(C_i(t), 0)\,(W_i(t) - m)$$

So a contribution of 1 snaps the node up to its maximum activation level,
$M$, while a contribution of -1 snaps the node down to its minimum, $m$.
Eventually, a static condition is reached where some nodes reach their
maximum or minimum strength, while the rest of them receive
contributions of 0, when the positive and negative contributions
balance.
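
A minimal sketch of one relaxation cycle may make the update rule
concrete. It assumes details the text does not give: the contribution
is scaled into [-1, 1] by simple clipping, the bounds default to a
maximum of 1.0 and a minimum of 0.0, and the names contributions, step
and run are hypothetical. Links are stored as (i, j) pairs, read here
as "node j contributes to node i".

```python
def contributions(weights, links):
    """C_i = sum over j of L_ij * W_j, clipped to [-1, 1] (assumed scaling)."""
    c = {i: 0.0 for i in weights}
    for (i, j), l_ij in links.items():
        c[i] += l_ij * weights[j]
    return {i: max(-1.0, min(1.0, v)) for i, v in c.items()}


def step(weights, links, w_max=1.0, w_min=0.0):
    """One synchronous cycle: every node is updated from the old weights."""
    c = contributions(weights, links)
    new_weights = {}
    for i, w in weights.items():
        excite = max(c[i], 0.0) * (w_max - w)   # pulls the node toward w_max
        inhibit = min(c[i], 0.0) * (w - w_min)  # pulls the node toward w_min
        new_weights[i] = w + excite + inhibit
    return new_weights


def run(weights, links, cycles, w_max=1.0, w_min=0.0):
    """Apply step() a fixed number of times and return the final weights."""
    for _ in range(cycles):
        weights = step(weights, links, w_max, w_min)
    return weights
```

For two mutually inhibitory nodes, run({"a": 0.75, "b": 0.25},
{("a", "b"): -0.5, ("b", "a"): -0.5}, cycles=20) drives the weaker node
down faster than the stronger one, which is the competitive behavior
lateral inhibition is meant to produce.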

3.2. Network Construction

An activation/inhibition network such as this can smoothly model
the flow of quantitative constraints up and down a multilevel system.
For natural language processing, the main problem becomes how to build
such a multilevel network. We feel that a proper network can be built
through the judicious instantiation of network fragments which are
stored in frames [Minsky, 1975].

The frames in our system contain the knowledge of syntax, of
semantic features, and of case roles, organized to efficiently generate
pieces of network on demand. These frames are richly interconnected with
activation and inhibition links, and constitute the general knowledge
base of the system. When sentences are input, a temporary network is
constructed out of fragments stored within lexically accessed frames.
These fragments are organized into a network by the same sort of
breadth-first operation used in a chart parser [Kay, 1973]. The
resulting network, for instance, has activation links between phrase
markers and their constituents and between case roles and their fillers,
and inhibition links between pairs of phrases that have common
constituents and case roles with common fillers.

In more detail, the required actions are as follows:

First, there is breadth-first instantiation of nodes representing
phrase markers, case roles, and expectations for other nodes. These
expectations are triggered when lexical items or grammatical
constituents are encountered, and consist of simple feature patterns to
match and connection procedures to be carried out if the match occurs.
Secondly, there is pattern-based connection whereby if a newly
instantiated node matches a pattern, specific linkages are made. As an
example of these two processes, if a node of type NP is
instantiated, it will then cause the instantiation of an expectation
that a VP will occur; if a VP is found, an S is generated and connected
to both the NP and VP. Of course, if more than one candidate for a
pattern shows up, the candidates are connected with an inhibition
link, so that one will eventually be eliminated.
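
The NP/VP example can be sketched as follows. This is an illustration
under stated assumptions, not the authors' implementation: the Node and
Network classes, the link strengths ACTIVATE and INHIBIT, the use of
symmetric links, and the matching of an expectation purely on node type
and starting position are all hypothetical simplifications.

```python
from dataclasses import dataclass, field

ACTIVATE = 0.5   # assumed strength of an activation link
INHIBIT = -0.5   # assumed strength of an inhibition link


@dataclass(frozen=True)
class Node:
    kind: str    # e.g. "NP", "VP", "S"
    start: int   # span start (word position)
    end: int     # span end
    label: str   # distinguishes competing analyses of the same span


@dataclass
class Network:
    nodes: list = field(default_factory=list)
    links: dict = field(default_factory=dict)         # (node, node) -> weight
    expectations: list = field(default_factory=list)  # pending patterns

    def link(self, a, b, weight):
        # Links are stored in both directions (an assumption of this sketch).
        self.links[(a, b)] = weight
        self.links[(b, a)] = weight

    def instantiate(self, node):
        self.nodes.append(node)
        # Pattern-based connection: does the node fulfil an expectation?
        for exp in self.expectations:
            if exp["kind"] == node.kind and exp["start"] == node.start:
                trigger = exp["trigger"]
                s = Node("S", trigger.start, node.end,
                         f"({trigger.label}) ({node.label})")
                self.nodes.append(s)
                self.link(s, trigger, ACTIVATE)
                self.link(s, node, ACTIVATE)
                # Competing candidates for one pattern inhibit each other.
                for rival in exp["matches"]:
                    self.link(rival, node, INHIBIT)
                exp["matches"].append(node)
        # Instantiation of expectations: an NP expects a following VP.
        if node.kind == "NP":
            self.expectations.append(
                {"kind": "VP", "start": node.end, "trigger": node, "matches": []})
```

Instantiating an NP spanning words 0-1 and then two competing VP nodes
that both start at position 1 yields two S nodes, each tied to the NP
and to its own VP by activation links, with an inhibition link between
the two VPs.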

The activation and inhibition processes reinforce nodes that are
well supported by activation links and inhibit those which are not, so,
for example, expectations which are not quickly fulfilled will die.
Furthermore, activation and inhibition are also happening in the
background frame system by a purely word associative scheme, which helps
prime good word senses (and aids in schema selection). Finally, nodes
which become inhibited below a certain point are garbage collected, thus
keeping the active network as small as possible.
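
The garbage-collection step might look like the sketch below. The
threshold value and the function name collect_garbage are assumptions;
the text only says that nodes inhibited "below a certain point" are
removed.

```python
def collect_garbage(weights, links, threshold=0.05):
    """Drop nodes whose activation is below threshold, and their links.

    weights: dict node -> activation level
    links:   dict (node, node) -> link weight
    """
    alive = {n for n, w in weights.items() if w >= threshold}
    kept_weights = {n: weights[n] for n in alive}
    kept_links = {
        (i, j): l for (i, j), l in links.items() if i in alive and j in alive
    }
    return kept_weights, kept_links
```

Removing the links along with the nodes matters: a dead node left in
the link table would keep contributing (a zero or stale value) to its
neighbors on every cycle.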

3.3. Example of Operation

Some preliminary results are presented here which demonstrate the
feasibility of this activation/inhibition approach to NLP. However,
since the system is in its early stages, the networks presented were
built by hand. We demonstrate how the system reacts to syntactic
ambiguity, how a lexical preference can affect its behavior, and finally
how semantic constraints can be integrated.

Consider, then, the following sentence, which, in the absence of
any semantic knowledge, is syntactically ambiguous due to the lexical
ambiguity of "up":


John  ate  up  the  street. 

The hand-built network for this sentence is shown in figure 1 with
arrows denoting activation links, and circles denoting inhibition links.
Note that each node in this network is suffixed by two numbers which
denote the "span" [Hobbs, 1974], or sequence of words, of that node.

One would expect a robust NLP system to be confused by ambiguity
but then to gracefully resolve it. This is indeed what happens. Figure 2
contains a graph of the activation levels over time for all the nodes in
the network. Each node is depicted by a single letter, and each
activation cycle by a horizontal row in the graph. When a letter traces
a path to the left, it is being inhibited and when it moves to the
right, it is being activated.

The most interesting node pairs to watch are B and C, the mutually
inhibitory sentences, and G and F, the mutually inhibitory verb phrases:

B = (John) (ate (up the street))

C = (John) (ate up) (the street)

G = (ate (up the street))

F = (ate up)

The system is confused at first: B is more heavily weighted than C,
so the sentence with the preposition is selected, while F is more
strongly activated than G, so the verb-particle phrase is selected. This
selection is, obviously, inconsistent. But then, after about 30 cycles,
the system "decides" ("Look Ma, no homunculus!") on a consistent reading
of "up" as a preposition, and weights G more heavily than F.

In the absence of semantic preferences (e.g. a preference for
interpreting "street" as a location), syntactic preferences can play a
role. Certain words have lexical tendencies, as, for instance, the
word "does", which is most often a verb, but which is also a plural
noun, meaning several female deer.

Figure 3 demonstrates the sensitivity of an activation/inhibition
network to syntactic preferences. The link strength from "up" to
"particle" has been increased, corresponding to a lexical preference.
Notice that the phrases related to interpreting "up" as a preposition
(B, G, J, and P) become inhibited much more quickly this time.

However, when humans process this sentence, they also take into
account the knowledge that "street" is a good candidate for a location,
but a bad candidate for the object of eating. The next example
demonstrates the sensitivity of our NLP approach to this semantic
knowledge. Four nodes have been added and connected into the network.
The verb phrase "ate" is linked to "ate-loc" and "ate-obj," and the verb
phrase "ate up" is linked to "ate-up-loc" and "ate-up-obj." These nodes
represent "cases" [Fillmore, 1968] of their respective nodes and are a
subset of those that would be instantiated by our system. The
pattern-matching connection component would connect the prepositional
phrase "up the street" to "ate-loc" based on its span and on inherited
features from "up" and "street".

The modified network is shown in figure 4, and figure 5 graphs the
response of the activation/inhibition network to this new information.
As one can see, after 15 cycles, all nodes related to interpreting "up"
as a particle (T, S, C, F, and I) are being rapidly inhibited.
4. Prospects

The results given above are interesting in that they demonstrate
the sensitivity of activation/inhibition networks to slight differences
in knowledge. Currently we are working to complete the automatic
instantiation and connection components of the system.

The use of a parallel and decentralized decision process can be
brought to bear on many other interesting problems in NLP as well. For
instance, there are indications that the timing and volume of spoken
language both play useful roles in disambiguation [Wales and Toner,
1979]. A system based on activation and inhibition could be designed for
sensitivity to these clues, since time is, after all, a crucial element
in the activation/inhibition process.

Furthermore, the processing of garden path sentences, which are an
interesting but not well-understood phenomenon in natural language,
could quite possibly be handled by an activation/inhibition network.
Marcus [1979] built a parser which attempted to account for garden-path
sentences as a result of memory limitations. Unfortunately, there are
garden path sentences his parser could (though shouldn't) handle [Milne,
1980], such as:


The prime number few.

Within the framework of activation/inhibition networks, garden path
sentences would be accounted for by irreversible inhibition of
expectations.

Also, we have recently begun to consider ways of integrating a novel
form of knowledge representation, "event shape diagrams" [Waltz 1982],
to model certain kinds of metaphor understanding and adverbial
modification. As an example, these methods should allow us to interpret
sentences such as:

Bobbie's metal legs ate up the space between himself and Susie.

as meaning a kind of PTRANS [Schank 1975].

Finally, a practical system based on activation/inhibition networks
could be the starting point for new computing architectures. In this
vein, Pollack [1982] has designed a VLSI cell for parallel simulation
of activation/inhibition networks, thus showing that a programmable set
of logical connections (i.e. links) can be run on a machine with fixed
and regular physical connections (i.e. wires).

5. Continuing Work

There are many areas of this research which need further
definition. We are currently working to more fully understand the
nature and behavior of these networks, as well as to develop a
methodology of assigning weights to nodes and links. Also, there is a
correspondence between the decisions being made via
activation/inhibition networks and the work done in belief maintenance
systems [Doyle, 1978], and we are trying to precisely define this
correspondence.
Figure 1 - Syntax Activation/Inhibition Network
For "John Eats Up The Street"


(part23 is shown as I)
(prep23 is shown as J)
(the34 is shown as K)
(det34 is shown as L)
(street45 is shown as M)
(n45 is shown as N)
(np35 is shown as O)
(pp25 is shown as P)

Figure  4  -  Semantically  Augmented  Network 


